Abstract

We review the reasoning underlying two approaches to the combination of sensory uncertainties. The first approach is noncommittal, making no assumptions about properties of uncertainty or parameters of stimulation, as in Gepshtein, Tyukin, and Albright (2010). We then explain the relationship between this approach and the one commonly used in modeling "higher-level" aspects of sensory systems, such as visual cue integration, where assumptions are made about properties of stimulation. The two approaches follow similar logic, except that in one case maximal uncertainty is minimized and in the other minimal certainty is maximized. Finally, we demonstrate how optimal solutions are found to the problem of resource allocation under uncertainty.
1. Combination of uncertainties
1.1. Noncommittal approach 
Let the stimulus be an integrable function $I(x)$ of one variable that depends on two aspects of stimulation:

• Stimulus location on $x$, where $x$ can be space or time, the "location" indicating, respectively, where or when stimulation occurred.

• Stimulus content on $f_x$, where $f_x$ can be the spatial or temporal frequency of stimulus modulation.

We consider a sensory system equipped with many measuring devices, each able to estimate both stimulus 
location and content from I(x). We assume that the error of estimation is a random variable with 
probability density p(x, f). 

It is sometimes assumed that sensory systems know $p(x, f)$, a case we review in the next section. But in general we do not know $p(x, f)$; we only know (or guess) some of its properties, such as its mean value and variance. In particular, let

$$p_x(x) = \int p(x,f)\,df, \qquad p_f(f) = \int p(x,f)\,dx \tag{S1}$$

be the marginal densities of $p(x,f)$ on dimensions $x$ and $f_x$. Sensory systems can optimize their performance with this minimal knowledge, as follows.

To reduce the chances of making gross errors, we use the following strategy. We find the condition of minimal uncertainty against the profile of maximal uncertainty, i.e., we use a minimax approach (von Neumann, 1928; Luce & Raiffa, 1957). We do so in two steps. First, we find the $p_x(x)$ and $p_f(f)$ for which measurement uncertainty is maximal. Then we find the condition at which the function of maximal uncertainty has the smallest value: the minimax point.

We evaluate maximal uncertainty using the well-established definition of entropy (Shannon, 1948):

$$H(X,F) = -\iint p(x,f)\log p(x,f)\,dx\,df.$$

Recall that Shannon's entropy is sub-additive:

$$H(X,F) \le H(X) + H(F) = H^*(X,F), \tag{S2}$$

where

$$H(X) = -\int p_x(x)\log p_x(x)\,dx, \qquad H(F) = -\int p_f(f)\log p_f(f)\,df.$$

Therefore, the uncertainty of measurement cannot exceed

$$H^*(X,F) = -\int p_x(x)\log p_x(x)\,dx - \int p_f(f)\log p_f(f)\,df. \tag{S3}$$

Eq. S3 is the "envelope" of maximal measurement uncertainty: a "worst-case" estimate. 
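The discrete analogue of Eqs. S1 to S3 can be checked numerically. The sketch below is our own illustration, not part of the method of Gepshtein et al. (2010): it replaces the continuous density $p(x,f)$ with a hypothetical 2×2 probability table, obtains the marginals by summing rows and columns, and reports the gap $H(X) + H(F) - H(X,F)$, which is never negative and vanishes when the errors on $x$ and $f$ are independent. The helper names `entropy` and `subadditivity_gap` are hypothetical.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution; zero cells contribute 0."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def subadditivity_gap(joint):
    """Return H(X) + H(F) - H(X,F) for a discrete joint table p(x, f).

    Rows index x and columns index f. By sub-additivity (Eq. S2) the gap is >= 0.
    """
    joint = np.asarray(joint, dtype=float)
    joint = joint / joint.sum()      # normalize to a probability table
    p_x = joint.sum(axis=1)          # marginal over f (discrete analogue of Eq. S1)
    p_f = joint.sum(axis=0)          # marginal over x
    return entropy(p_x) + entropy(p_f) - entropy(joint)

# Hypothetical correlated errors: H(X,F) falls short of the envelope H*(X,F).
correlated = np.array([[0.4, 0.1],
                       [0.1, 0.4]])
# Hypothetical independent errors: p(x, f) = p_x(x) p_f(f), so Eq. S2 holds with equality.
independent = np.outer([0.5, 0.5], [0.3, 0.7])

print(subadditivity_gap(correlated))    # about 0.193
print(subadditivity_gap(independent))   # zero up to rounding
```

The envelope $H^*(X,F)$ is thus the joint entropy the system would face if it could not rely on any dependence between the two error components.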
By the Boltzmann theorem on maximum-entropy probability distributions (Cover & Thomas, 2006), among probability densities with fixed means and variances, entropy is maximal when the densities are Gaussian. Then, maximal entropy is a sum of their variances (Cover & Thomas, 2006). We obtain

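$$p_x(x) = \frac{1}{\sigma_x\sqrt{2\pi}}\, e^{-x^2/2\sigma_x^2}, \qquad p_f(f) = \frac{1}{\sigma_f\sqrt{2\pi}}\, e^{-f^2/2\sigma_f^2},$$

where $\sigma_x$ and $\sigma_f$ are the standard deviations. The maximal entropy is then simply

$$H = \sigma_x^2 + \sigma_f^2. \tag{S4}$$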

That is, when variances are unknown, maximal uncertainty of measurement is a sum of variances of 
measurement components. 

This is the method used by Gepshtein et al. (2007, 2010) in derivations of joint uncertainty and composite uncertainty functions.¹ The authors then found the optimal conditions by looking for minimal values of the uncertainty functions.

1.2. Top-down approach 

Now we assume the system enjoys some knowledge of stimulation, so we can use likelihood as a measure of uncertainty. Suppose we want to derive a combined estimate $z$ from two estimates $x$ and $f$ of some parameter of stimulation. We assume that the likelihood functions $P(z|x,f)$, $P_x(z|x)$, and $P_f(z|f)$ are continuous, differentiable, and known. Let us first assume that the likelihoods are separable:

$$P(z|x,f) = P_x(z|x)\,P_f(z|f). \tag{S5}$$

Then, the most likely value of $z$ is

$$z^* = \arg\max_z P(z|x,f) = \arg\max_z \big[\log P_x(z|x) + \log P_f(z|f)\big].$$

We can use the logarithmic transformation because it is a strictly increasing continuous function on $(0, \infty)$, and hence it does not change the location of the maxima of continuous functions.

It is commonly assumed that $P_x(z|x)$ and $P_f(z|f)$ are Gaussian functions, or that they are well approximated by Gaussian functions. For example, Yuille and Bülthoff (1996) assumed that cubic and higher-order terms of the Taylor expansion of $\log P_x(z|x)$ can be neglected, which is equivalent to the assumption of Gaussianity. (We return to this assumption, and to the assumption of separability, in a moment.) Then

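$$P_x(z|x) = c_x\, e^{-(z-z_x)^2/2\sigma_x^2}, \qquad P_f(z|f) = c_f\, e^{-(z-z_f)^2/2\sigma_f^2}, \qquad c_x, c_f \in \mathbb{R}_{>0},$$

and

$$\log P_x(z|x) + \log P_f(z|f) = \log c_x + \log c_f - \frac{1}{2\sigma_x^2}(z-z_x)^2 - \frac{1}{2\sigma_f^2}(z-z_f)^2.$$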



¹ For simplicity, Gepshtein et al. (2010) use intervals of measurement, rather than interval variances, as estimates of component uncertainties.
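The latter expression is maximized when its first derivative with respect to $z$ is zero. Hence

$$z^* = \left(\frac{1}{\sigma_x^2} + \frac{1}{\sigma_f^2}\right)^{-1}\left(\frac{z_x}{\sigma_x^2} + \frac{z_f}{\sigma_f^2}\right) = \frac{\sigma_f^2 z_x + \sigma_x^2 z_f}{\sigma_x^2 + \sigma_f^2}, \tag{S6}$$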



which is the familiar weighted-average rule of cue combination (Cochran, 1937; Maloney & Landy, 1989; Clark & Yuille, 1990; Landy, Maloney, Johnston, & Young, 1995; Yuille & Bülthoff, 1996). In general, when the number of measurements is greater than two, the combination rule of Eq. S6 becomes

$$z^* = \frac{\sum_i \sigma_i^{-2} z_i}{\sum_i \sigma_i^{-2}}, \tag{S7}$$

where $z_i$ are the values at which the individual likelihood functions attain their maxima and $\sigma_i$ are their standard deviations.
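A minimal sketch of Eq. S7 in Python, written for illustration (the function names are hypothetical). It computes the inverse-variance weighted average and, as a check on the derivation above, maximizes the summed Gaussian log-likelihoods by brute force on a grid; the constants $\log c_i$ are left out because they do not depend on $z$.

```python
import numpy as np

def combine_cues(estimates, sigmas):
    """Inverse-variance weighted average of cue estimates, as in Eq. S7.

    estimates: values z_i at which the individual Gaussian likelihoods peak.
    sigmas:    standard deviations sigma_i of those likelihoods (all > 0).
    """
    z = np.asarray(estimates, dtype=float)
    w = 1.0 / np.asarray(sigmas, dtype=float) ** 2   # weights 1 / sigma_i^2
    return float(np.sum(w * z) / np.sum(w))

def grid_argmax(estimates, sigmas, grid):
    """Brute-force check: maximize the summed Gaussian log-likelihoods on a grid.

    The constants log c_i are omitted because they do not shift the maximum.
    """
    z = np.asarray(estimates, dtype=float)[:, None]
    s = np.asarray(sigmas, dtype=float)[:, None]
    log_lik = -np.sum((grid[None, :] - z) ** 2 / (2.0 * s ** 2), axis=0)
    return float(grid[np.argmax(log_lik)])

# Two cues: z_x = 1 with sigma_x = 1, and z_f = 3 with sigma_f = 2.
print(combine_cues([1.0, 3.0], [1.0, 2.0]))   # 1.4
print(grid_argmax([1.0, 3.0], [1.0, 2.0], np.linspace(-5.0, 5.0, 100001)))
```

With these values Eq. S6 gives $(4 \cdot 1 + 1 \cdot 3)/5 = 1.4$, and the grid search agrees to within the grid spacing. The more reliable cue (smaller $\sigma$) pulls the combined estimate toward itself.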

Why is it commonly assumed that likelihood functions are separable, as in Eq. S5, and Gaussian? An answer follows from the argument we presented in the previous section. Suppose that one seeks to estimate the likelihood function when its shape is unknown. We saw in the previous section that the least certain estimate is the likelihood function for which the entropy is maximal. Hence, by sub-additivity of entropy (Eq. S2), the least certain estimate of $P(z|x,f)$ is

$$P(z|x,f) = P_x(z|x)\,P_f(z|f),$$

as in Eq. S5. Moreover, if the mean values and variances of $P_x(z|x)$ and $P_f(z|f)$ are fixed, then the likelihood functions must be Gaussian, by the same argument. Thus, separable Gaussian likelihood functions are the least certain estimates.

2. Resource allocation 

In Gepshtein et al. (2010) we asked how sensory systems ought to allocate their resources in the face of uncertainties inherent in measurement and stimulation. We approached this problem in two steps. First, we combined all uncertainties into uncertainty functions: comprehensive descriptions of how the quality of measurement varies across conditions of measurement. Second, we proposed how limited resources are to be allocated given the uncertainty functions. Here we illustrate the second step in more detail, using the approach of constrained optimization.

A key requirement of allocation is to optimize reliability (reduce uncertainty) of measurement by 
many sensors. Satisfying this requirement alone makes the system place all sensors where conditions of 
measurement are least uncertain, leaving the system unprepared for sensing the stimuli that are useful 
but whose uncertainty is high. To prevent such gaps of allocation, we propose that minimal requirements 
should be twofold: 

A. Reliability: Prefer low uncertainty. 

B. Comprehensiveness: Measure all useful stimuli. 
We formalize these requirements as follows. Let: 

• $A \in [a, b] \subset \mathbb{R}$ be the size of a measuring device ("receptive field"),

• $U(A): \mathbb{R} \to \mathbb{R}$ be the uncertainty function associated with measuring devices of different size, and

• $r(A): \mathbb{R} \to \mathbb{R}_{>0}$ be the amount of resources allocated across $A$ (e.g., the number of cells with receptive fields of size $A$).
Encouraging reliability. By requirement A, the system is penalized for allocating resources where uncertainty is high. This is achieved, for example, when the cost of placing resources at $A$ is

$$k_1 U(A)\, r(A),$$

where $k_1$ is a positive constant. The higher the uncertainty at $A$, or the larger the amount of resources allocated to $A$, the higher the cost. Hence the total cost of allocation is

$$J_1 = \int_a^b k_1 U(A)\, r(A)\, dA. \tag{S8}$$

For a fixed total amount of resources, functional $J_1$ is minimal when all the detectors are allocated to (i.e., have the size of) the $A$ at which $U(A)$ is lowest.



Encouraging comprehensiveness. By requirement B, the system is penalized for failing to measure particular stimuli. This is achieved, for example, when the allocation cost is

$$\frac{k_2}{r(A)},$$

where $k_2$ is a positive constant. The total penalty of this type is

$$J_2 = \int_a^b \frac{k_2}{r(A)}\, dA. \tag{S9}$$

Functional $J_2$ is large (infinite) when all resources are allocated to a small vicinity (one point). $J_2$ is small when $r(A)$ is large for all $A$.



Prescription of allocation. The total penalty of requirements A and B is

$$J = \int_a^b \left(k_1 U(A)\, r(A) + \frac{k_2}{r(A)}\right) dA. \tag{S10}$$

Using standard tools of the calculus of variations (e.g., Elsgolc, 2007), we find the function $r(A)$ that minimizes $J$. In particular, we consider a variation of $J$ with respect to changes of $r(A)$:

$$\delta J = \int_a^b \left(k_1 U(A) - \frac{k_2}{r^2(A)}\right) \delta r(A)\, dA.$$



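Because at the optimal $r(A)$ the variation $\delta J$ is zero for all $\delta r(A)$, we deduce that the condition of optimality is

$$U(A) - \frac{k}{r^2(A)} = 0, \qquad k = \frac{k_2}{k_1}. \tag{S11}$$

In other words (taking the positive root, since $r(A) > 0$),

$$r(A) = \sqrt{\frac{k}{U(A)}}. \tag{S12}$$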



This r(A) is the prescription of optimal allocation. 
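Because the integrand of Eq. S10 contains no derivatives of $r(A)$, the optimum can be checked pointwise. The sketch below assumes a hypothetical U-shaped uncertainty function $U(A) = 1/A + A/2$ on $[0.5, 4]$ and constants $k_1 = 1$, $k_2 = 2$; all of these are our own choices for illustration, not values from Gepshtein et al. (2010). It evaluates Eq. S12 on a grid and confirms that rescaling $r(A)$ up or down at any $A$ raises the penalty.

```python
import numpy as np

def optimal_allocation(U, k1, k2):
    """Unconstrained optimum of Eq. S10: r(A) = sqrt(k / U(A)), k = k2 / k1 (Eq. S12).

    Requires U(A) > 0 everywhere; lower uncertainty receives more resources.
    """
    U = np.asarray(U, dtype=float)
    return np.sqrt((k2 / k1) / U)

def integrand(U, r, k1, k2):
    """Pointwise penalty k1*U(A)*r(A) + k2/r(A), the integrand of Eq. S10."""
    return k1 * U * r + k2 / r

# Hypothetical uncertainty function over receptive-field sizes A in [0.5, 4].
A = np.linspace(0.5, 4.0, 200)
U = 1.0 / A + 0.5 * A          # illustrative curve with its minimum at A = sqrt(2)
k1, k2 = 1.0, 2.0

r_opt = optimal_allocation(U, k1, k2)
# Perturbing r by +/-10% at every A can only increase the penalty.
for scale in (0.9, 1.1):
    assert np.all(integrand(U, r_opt, k1, k2) <= integrand(U, scale * r_opt, k1, k2))

print(A[np.argmax(r_opt)])     # close to sqrt(2), where this U(A) is smallest
```

The allocation peaks near $A = \sqrt{2}$, where this particular $U(A)$ is minimal, which is what the prescription requires.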
Amount of resources. If the total amount of resources in the system is known and equals $C$,

$$\int_a^b r(A)\, dA = C, \tag{S13}$$

then we may modify coefficients $k_1$ and $k_2$ in Eq. S10 to make Eq. S10 consistent with Eq. S13. Or, we may use the method of Lagrange multipliers, looking for conditions where the variation of the following functional vanishes:

$$J = \int_a^b \left(k_1 U(A)\, r(A) + \frac{k_2}{r(A)}\right) dA + \lambda\left(\int_a^b r(A)\, dA - C\right). \tag{S14}$$

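We find the Lagrange multiplier $\lambda$ at which Eq. S13 is satisfied. The solution (using a method similar to that used for solving Eq. S11) is

$$k_1 U(A) + \lambda - \frac{k_2}{r^2(A)} = 0 \quad\Longrightarrow\quad r(A) = \sqrt{\frac{k_2}{k_1 U(A) + \lambda}}, \tag{S15}$$

provided that

$$\int_a^b \sqrt{\frac{k_2}{k_1 U(A) + \lambda}}\, dA = C.$$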



The latter constraint is used to find $\lambda$ in Eq. S15. In either case, the shape of the optimal allocation function $r(A)$ is determined by $U(A)$, such that the allocation function is maximal where $U(A)$ is minimal. The formulation in Eq. S14 has an advantage: it allows one to derive optimal prescriptions under changes in the amount of resources allocated to the task, such as in selective attention.
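Eq. S15 leaves $\lambda$ implicit. One way to find it, sketched here under our own assumptions (a sampled $U(A)$, trapezoidal integration, and the same hypothetical $U(A) = 1/A + A/2$ as an illustrative input), is bisection: $\int_a^b r(A)\,dA$ decreases monotonically as $\lambda$ grows, and $\lambda$ must keep $k_1 U(A) + \lambda > 0$ for every $A$. The names `trapezoid` and `allocation_with_budget` are hypothetical helpers, not from the original work.

```python
import numpy as np

def trapezoid(y, x):
    """Trapezoidal-rule integral of samples y over the grid x."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def allocation_with_budget(A, U, k1, k2, C, iters=200):
    """Find lambda in Eq. S15 so that the allocation integrates to C (Eq. S13).

    Returns (lambda, r) with r(A) = sqrt(k2 / (k1 U(A) + lambda)).
    """
    A = np.asarray(A, dtype=float)
    U = np.asarray(U, dtype=float)

    def r_of(lam):
        return np.sqrt(k2 / (k1 * U + lam))

    def total(lam):
        return trapezoid(r_of(lam), A)

    floor = -k1 * U.min()          # lambda must stay above this so that k1*U + lambda > 0
    lo = hi = floor + 1.0
    while total(lo) < C:           # approach the floor until the budget is exceeded
        lo = floor + (lo - floor) / 2.0
    while total(hi) > C:           # move away from the floor until the budget is undershot
        hi = floor + (hi - floor) * 2.0
    for _ in range(iters):         # bisection: total() decreases as lambda increases
        mid = 0.5 * (lo + hi)
        if total(mid) > C:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return lam, r_of(lam)

# Hypothetical uncertainty function and a budget of C = 5 units of resources.
A = np.linspace(0.5, 4.0, 400)
U = 1.0 / A + 0.5 * A
lam, r = allocation_with_budget(A, U, k1=1.0, k2=2.0, C=5.0)
print(lam, trapezoid(r, A))
```

Here the budget exceeds what the unconstrained prescription of Eq. S12 would use, so $\lambda$ comes out negative; a smaller budget would give a positive $\lambda$. In both cases the peak of $r(A)$ stays where $U(A)$ is minimal, and changing $C$ only changes how sharply resources concentrate there.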

Generalizations. In a multidimensional case, when $A$ represents several variables (e.g., the spatial and temporal extents of receptive fields, $S$ and $T$) and $U(\cdot)$ is a function of many variables, the prescription is

$$r(s,t) = \sqrt{\frac{k}{U(s,t)}}.$$

Using the method of Lagrange multipliers, one can show that a similar result is obtained when the costs of reliability and comprehensiveness (Eqs. S8 and S9) have more general formulations:

$$J_1 = \int_a^b k_1 U(A)\, r^p(A)\, dA, \qquad J_2 = \int_a^b \frac{k_2}{r^q(A)}\, dA, \qquad p, q \geq 1.$$

The previously derived prescription holds: allocate the maximal amount of resources to conditions of minimal uncertainty.
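For the unconstrained case, the generalized prescription can be written out explicitly. The derivation below is our own short sketch, assuming, as in Eq. S10, that the total penalty is $J_1 + J_2$ and that $U(A) > 0$. Setting the derivative of the integrand with respect to $r$ to zero gives

```latex
\frac{\partial}{\partial r}\left(k_1 U(A)\, r^{p} + \frac{k_2}{r^{q}}\right)
  = p\,k_1 U(A)\, r^{p-1} - q\,k_2\, r^{-q-1} = 0
\quad\Longrightarrow\quad
r(A) = \left(\frac{q\,k_2}{p\,k_1\,U(A)}\right)^{1/(p+q)} .
```

The exponent $1/(p+q)$ is positive, so $r(A)$ still decreases as $U(A)$ increases, and for $p = q = 1$ the expression reduces to Eq. S12. The second derivative, $p(p-1)k_1 U(A) r^{p-2} + q(q+1)k_2 r^{-q-2}$, is positive for $p, q \geq 1$, so this stationary point is a minimum.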